Jewellery Data Set - Customer Segmentation using DBSCAN Algorithm

Importing the required libraries

Read the data set and display key statistics


As seen in the outputs above:

There are 505 customer records in the dataset.
There are no null values in any of the features.
Age and Income are integer data, whereas SpendingScore and Savings are float data.
The minimum value of both SpendingScore and Savings is 0, which may be erroneous and needs verification.
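The checks listed above can be reproduced with a few pandas calls. This is a minimal sketch: the four rows are made-up stand-in values (only the column names and the two zero-valued records come from this notebook), and the real notebook would load the full dataset instead.

```python
import pandas as pd

# Hypothetical stand-in rows; the real notebook loads the jewellery dataset here.
df = pd.DataFrame({
    "Age": [17, 35, 58, 86],
    "Income": [90000, 72000, 105000, 30000],
    "SpendingScore": [0.75, 0.30, 0.45, 0.0],
    "Savings": [0.0, 14000.5, 9000.25, 17000.0],
})

print(df.shape)                   # (number of records, number of features)
print(df.dtypes)                  # integer vs float features
null_counts = df.isnull().sum()   # null values per feature
print(null_counts)

# The min/max rows of describe() make suspicious extremes such as 0 easy to spot.
extremes = df.describe().loc[["min", "max"]]
print(extremes)
```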

Verifying "0" values in SpendingScore and Savings Features


Savings = 0

The age of this customer is 17.
Since the customer is a minor, it is quite possible that they have no savings and that their income and spending come from their parents' wealth.

SpendingScore = 0

The age of this customer is 86.
Given the advanced age and related factors such as health and reduced needs, we can assume that this customer has not been spending (at least for the past few years), so the score of 0 is justified.


Hence, these records are not dropped.
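One way to pull up the zero-valued records for inspection is a boolean filter on each column. The sketch below uses made-up rows in which only the two ages (17 and 86) and the zero values mirror the findings above.

```python
import pandas as pd

# Hypothetical rows; only the zero-valued records reflect the notebook's findings.
df = pd.DataFrame({
    "Age": [17, 35, 58, 86],
    "Income": [90000, 72000, 105000, 30000],
    "SpendingScore": [0.75, 0.30, 0.45, 0.0],
    "Savings": [0.0, 14000.5, 9000.25, 17000.0],
})

# Select the rows where each feature is exactly 0 and inspect them in full.
zero_savings = df[df["Savings"] == 0]
zero_spending = df[df["SpendingScore"] == 0]

print(zero_savings)
print(zero_spending)
```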


Exploratory Data Analysis

Inferences from the plots above:

The general trend is that as age increases, customers' spending decreases and savings increase.
Typically, younger customers spend more and save less, while older customers spend less and save more.
There are, however, exceptions to this general trend.
We do observe older customers who spend more and higher-income customers who save less. These could reflect age- and lifestyle-related expenses such as health, or children's education and marriage.

The higher the spending score, the lower the savings. There appears to be a strong inverse correlation between these two features.

Hence plotting a heatmap to check the correlation score.

As seen in the scatter plot and the heatmap, there is a strong correlation between SpendingScore and Savings, and the two are inversely related. We can generalize that a high spender has low savings and a low spender has high savings.

Hence the Savings feature is dropped from the dataset: the two features carry largely the same information, and the customer's spending score seems to be the better metric for segmentation and for gathering insights.
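The correlation check and the drop can be sketched as follows. The data here is synthetic and deliberately built with an inverse relationship, so the printed coefficient says nothing about the real dataset; the heatmap call in the comment assumes seaborn is available.

```python
import numpy as np
import pandas as pd

# Synthetic data with an inverse relationship, for illustration only.
rng = np.random.default_rng(0)
spending = rng.uniform(0, 1, 200)
savings = 20000 * (1 - spending) + rng.normal(0, 500, 200)
df = pd.DataFrame({"SpendingScore": spending, "Savings": savings})

corr = df.corr()                          # Pearson correlation matrix
r = corr.loc["SpendingScore", "Savings"]  # negative value means inverse relation
print(round(r, 3))

# A heatmap of the matrix could be drawn with seaborn.heatmap(corr, annot=True).

# Drop the redundant feature before clustering.
reduced = df.drop(columns=["Savings"])
print(reduced.columns.tolist())
```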

Scaling the data
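DBSCAN relies on distances between points, so a feature on a large numeric range such as Income would otherwise dominate Age and SpendingScore. The notebook text does not say which scaler was used; the sketch below assumes scikit-learn's StandardScaler and uses made-up input values.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical raw values for Age, Income and SpendingScore.
X = np.array([
    [17, 90000, 0.75],
    [35, 72000, 0.30],
    [58, 105000, 0.45],
    [86, 30000, 0.00],
])

# Assumption: StandardScaler, which rescales each column to zero mean
# and unit variance. Another scaler would lead to different eps values.
X_scaled = StandardScaler().fit_transform(X)

print(X_scaled.mean(axis=0).round(6))
print(X_scaled.std(axis=0).round(6))
```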

Approach 1: Deriving the minpoints and eps using heuristics

A common heuristic is to set minpoints to twice the number of features.

Using this minpoints value, we can then compute the nearest-neighbour distances and determine the eps value from their plot.

As seen in the plot above, the first sharp change in the curve appears at a distance of about 0.2. Hence the eps value is set to 0.2 for clustering.
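A minimal sketch of this heuristic, assuming scikit-learn's NearestNeighbors and synthetic three-feature data (so the numbers it produces are not those of the notebook): compute each point's distance to its k-th nearest neighbour, sort the distances, and look for the sharp change in the sorted curve.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Synthetic stand-in for the scaled data: two tight groups in three features.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 0.1, (100, 3)),
    rng.normal(2.0, 0.1, (100, 3)),
])

# Heuristic: minpoints = 2 x number of features.
min_points = 2 * X.shape[1]

# Querying the fitted data returns each point as its own first neighbour
# (distance 0), so the last column is the distance to the (min_points - 1)-th
# other point. This matches DBSCAN, whose min_samples counts the point itself.
nn = NearestNeighbors(n_neighbors=min_points).fit(X)
distances, _ = nn.kneighbors(X)
k_distances = np.sort(distances[:, -1])

# Plotting k_distances (for example with matplotlib) gives the k-distance
# curve; eps is read off where the curve changes sharply.
print(k_distances[:5])
print(k_distances[-5:])
```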

Clustering using DBSCAN Algorithm

Adding the clusters to the dataset and verifying the number of clusters

We observe 6 distinct cluster labels.

One of these labels is -1, which DBSCAN assigns to noise points. This confirms the presence of outliers; the remaining 5 labels are the actual clusters.
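The sketch below shows how the -1 label arises, assuming scikit-learn's DBSCAN. The data is synthetic (two tight groups plus one isolated point), and the min_samples value is a placeholder derived from the three hypothetical features rather than the value used in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

# Synthetic scaled data: two tight groups and one far-away point.
rng = np.random.default_rng(0)
blob_a = rng.normal(0.0, 0.02, (50, 3))
blob_b = rng.normal(1.0, 0.02, (50, 3))
stray = np.array([[5.0, 5.0, 5.0]])
X = np.vstack([blob_a, blob_b, stray])

# eps as chosen from the k-distance plot; min_samples is a placeholder.
min_points = 2 * X.shape[1]
labels = DBSCAN(eps=0.2, min_samples=min_points).fit_predict(X)

# Attach the labels to the data and count the points per label.
df = pd.DataFrame(X, columns=["Age", "Income", "SpendingScore"])
df["Cluster"] = labels
counts = df["Cluster"].value_counts().sort_index()
print(counts)

# -1 is the noise label, so it is excluded when counting actual clusters.
n_clusters = df.loc[df["Cluster"] != -1, "Cluster"].nunique()
print(n_clusters)
```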

Verifying Presence of Outliers and Removing Them

The 29 outliers have been removed. The output above shows the final set of 5 clusters and the number of data points belonging to each.
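Removing the outliers amounts to filtering out the rows labelled -1. A minimal sketch with made-up labels, where the Cluster column name is an assumption:

```python
import pandas as pd

# Hypothetical labelled data; -1 marks DBSCAN noise points.
df = pd.DataFrame({
    "Age": [60, 62, 30, 95, 34, 18],
    "Cluster": [0, 0, 1, -1, 1, -1],
})

n_outliers = (df["Cluster"] == -1).sum()
print(n_outliers)

# Keep only the points that belong to an actual cluster.
clustered = df[df["Cluster"] != -1].reset_index(drop=True)
sizes = clustered["Cluster"].value_counts().sort_index()
print(sizes)
```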

Cluster Visualization

Clustering Summary
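A per-cluster summary of this kind can be built with a pandas groupby. The sketch below is an assumption about how such a table might be produced, using made-up values rather than the notebook's data.

```python
import pandas as pd

# Hypothetical clustered data with outliers already removed.
df = pd.DataFrame({
    "Age": [60, 62, 30, 34],
    "Income": [70000, 74000, 120000, 124000],
    "SpendingScore": [0.5, 0.6, 0.9, 0.8],
    "Cluster": [0, 0, 1, 1],
})

# Minimum, mean and maximum of every feature within each cluster.
summary = df.groupby("Cluster").agg(["min", "mean", "max"])
print(summary)
```

Reading the age, income and spending score ranges off such a table is what supports the per-cluster descriptions.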

Clustering Insights for Jewellery Business and Indicative Business Proposals:

Cluster 0 (Blue):

These are customers who are nearing retirement or have recently retired. Their income and spending scores are average. These customers can be given offers on jewellery/gold that are oriented towards medium- to long-term investment, options to buy in smaller chunks and accumulate, and goal-oriented investments such as children's marriage or education.

Cluster 1 (Violet):

These are customers whose age is very high (super senior citizens) and whose income is low. Given their age, they are highly unlikely to need jewellery as an ornament or as an investment (gold, silver etc.), and their income does not support jewellery purchases either. These customers can be considered for offers of the best rates on the sale of their existing jewellery, or for low-interest jewellery/gold loans.

Cluster 2 (Pink):

These are customers in the 25-45 years age range with a decent income. However, their spending score is low. Given their age, they might be investing in a house, funds, deposits etc. and probably buy jewellery only on a need basis (marriage, special occasions etc.), hence the low spending score. These customers can be given offers on jewellery/gold that are oriented towards long-term investment, options to buy in smaller chunks and accumulate, and goal-oriented investments such as children's marriage or education.

Cluster 3 (Orange):

These are customers whose age is very high (super senior citizens) and whose income is also very high. Given their age, the high income could be the result of a steady business or of good returns from investments made at an early age. Given their age and very low spending score, there seems to be no need for purchasing jewellery as an ornament. However, since these customers are business/investment oriented, they can be considered for tailored offers that present jewellery/gold as a high-return investment option.

Cluster 4 (Yellow):

These are young customers with very high income and a very high spending score. They could be young entrepreneurs or customers in high-paying jobs with niche skills. They have a high spending capacity and are comparatively an easier segment to target. They can be given a range of offers on jewellery, both as an ornament and as an investment.

Approach 2: Deriving Minpoints and EPS using Silhouette Score

As seen in the silhouette scores table above, the best score is obtained at EPS = 0.35 and minpoints = 6. Applying the DBSCAN algorithm with these parameters and verifying the clusters.
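A table like this can come from a small grid search over eps and minpoints, scoring each clustering with scikit-learn's silhouette_score. The sketch below is one possible version under stated assumptions: the data is synthetic, the candidate grids are made up, and noise points are excluded before scoring (the notebook text does not say how noise was handled).

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

# Synthetic scaled data: three well-separated groups in three features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.05, (60, 3)) for c in (0.0, 1.0, 2.0)])

results = []
for eps in (0.1, 0.2, 0.35):          # hypothetical candidate values
    for min_points in (4, 6, 8):      # hypothetical candidate values
        labels = DBSCAN(eps=eps, min_samples=min_points).fit_predict(X)
        mask = labels != -1           # assumption: score clustered points only
        n_clusters = len(set(labels[mask]))
        if n_clusters < 2:
            continue                  # silhouette needs at least two clusters
        score = silhouette_score(X[mask], labels[mask])
        results.append((score, eps, min_points))

# The highest-scoring combination is the candidate parameter pair.
best_score, best_eps, best_min_points = max(results)
print(best_eps, best_min_points, round(best_score, 3))
```

Scoring only the clustered points can favour parameter pairs that discard many points as noise, so the number of noise points is worth checking alongside the score.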

We see that there are no outliers here and that all data points have been clustered.

With minpoints and eps derived from the silhouette score, the key difference is that all data points were assigned to a cluster and none were marked as outliers.

The final clusters look similar to those observed previously, with similar age, spending score and income ranges. We see only minor changes, which result from the higher eps and minpoints values compared with the heuristics-based approach.

Hence the cluster summary and indicative business proposals are the same as those given earlier.